DOC: update intro and add link to preprint #80
Reviewer's Guide by Sourcery
This PR updates the README.md file to provide a more comprehensive and accurate description of the diverse_seq project, including its capabilities and performance characteristics. The changes replace the original brief introduction with a more detailed explanation and add a link to the project's preprint.
No diagrams generated as the changes look simple and do not need a visual representation.
Hey @GavinHuttley - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟡 Documentation: 1 issue found
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
`diverse_seq` provides tools for selecting a representative subset of sequences from a larger collection. It is an alignment-free method which scales linearly with the number of sequences. It identifies the subset of sequences that maximize diversity as measured using Jensen-Shannon divergence. `diverse_seq` provides a command-line tool (`dvs`) and plugins to the Cogent3 app system (prefixed by `dvs_`) allowing users to embed code in their own scripts. The command-line tools can be run in parallel.
`diverse-seq` implements computationally efficient alignment-free algorithms that enable efficient prototyping for phylogenetic workflows. It can accelerate parameter selection searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that are representative of the diversity in a collection. We show that selecting representative sequences with an entropy measure of *k*-mer frequencies correspond well to sampling via conventional genetic distances. The computational performance is linear with respect to the number of sequences and can be run in parallel. Applied to a collection of 10.5k whole microbial genomes on a laptop took ~8 minutes to prepare the data and 4 minutes to select 100 representatives. `diverse-seq` can further boost the performance of phylogenetic estimation by providing a seed phylogeny that can be further refined by a more sophisticated algorithm. For ~1k whole microbial genomes on a laptop, it takes ~1.8 minutes to estimate a bifurcating tree from mash distances.
You can read more about the methods implemented in `diverse_seq` in the preprint [here](https://biorxiv.org/cgi/content/short/2024.11.10.622877v1).
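The README text above describes diversity as Jensen-Shannon divergence computed over *k*-mer frequencies. As a rough illustration of that measure only, here is a minimal pure-Python sketch. It is not the `diverse_seq` implementation, and the function names `kmer_freqs`, `shannon_entropy` and `jsd` are hypothetical, chosen for this example.

```python
import math
from collections import Counter


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of the overlapping k-mers in seq."""
    if k < 1 or len(seq) < k:
        raise ValueError("sequence must be at least k characters long")
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def shannon_entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy in bits of a frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def jsd(freq_sets: list[dict[str, float]]) -> float:
    """Equal-weight Jensen-Shannon divergence of several distributions.

    Computed as the entropy of the mean distribution minus the mean
    of the individual entropies.
    """
    n = len(freq_sets)
    mean: Counter = Counter()
    for freqs in freq_sets:
        for kmer, p in freqs.items():
            mean[kmer] += p / n
    mean_entropy = sum(shannon_entropy(f) for f in freq_sets) / n
    return shannon_entropy(mean) - mean_entropy


# Hypothetical toy sequences, not data from the project
seqs = ["ACGTACGTAC", "ACGTACGTAA", "GGGGCCCCGG"]
profiles = [kmer_freqs(s, k=2) for s in seqs]
print(jsd(profiles[:2]), jsd(profiles))
```

Under this measure, a set of sequences with similar *k*-mer profiles scores close to zero, and adding a sequence with a very different profile raises the score, which is the intuition behind selecting a subset that maximizes divergence.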
issue (documentation): Package name is inconsistently written as both `diverse-seq` and `diverse_seq`
Please standardize the package name throughout the documentation to avoid confusion.
Pull Request Test Coverage Report for Build 11785928718
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
Summary by Sourcery
Update the README to enhance the introduction of `diverse_seq` and include a link to a relevant preprint.
Documentation:
- `diverse_seq` tool, highlighting its alignment-free algorithms and their application in phylogenetic workflows.
- `diverse_seq`.